Starting Templeton

To start Templeton, you should first install the software and create your configuration file. After installing and configuring, you may start Templeton by running the Templeton executable at the command line.

When Templeton starts, it first loads the default configuration files. These include:

  • .templetonrc in the /etc directory (Unix)
  • .templetonrc in the /etc/local directory (Unix)
  • .templetonrc, templeton.cfg, and templeto.cfg in the directory specified by the ETC environment variable
  • .templetonrc, templeton.cfg, and templeto.cfg in the directory specified by the HOME environment variable
  • .templetonrc, templeton.cfg, and templeto.cfg in the current directory

Additional configuration files may be specified as command line parameters.
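
The search order above can be pictured with a short Python sketch. It is only an illustration, not Templeton's own code: the function name candidate_config_files and its parameters are hypothetical, and the order of the three file names within one directory is assumed to follow the list above. What Templeton does with missing files or conflicting settings is not modeled.

```python
import posixpath

# File names looked up in the ETC, HOME, and current directories.
CONFIG_NAMES = (".templetonrc", "templeton.cfg", "templeto.cfg")


def candidate_config_files(env, cwd, extra=()):
    """Return candidate configuration paths in the documented order.

    env   -- mapping of environment variables (for example os.environ)
    cwd   -- the current directory
    extra -- files named on the command line (assumed to come last)
    """
    paths = ["/etc/.templetonrc", "/etc/local/.templetonrc"]
    for variable in ("ETC", "HOME"):
        directory = env.get(variable)
        if directory:  # skip variables that are not set
            paths.extend(posixpath.join(directory, n) for n in CONFIG_NAMES)
    paths.extend(posixpath.join(cwd, n) for n in CONFIG_NAMES)
    paths.extend(extra)
    return paths
```

Calling candidate_config_files(os.environ, os.getcwd()) would list every location worth checking on a Unix system, in order.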

    Running Templeton

    When Templeton starts, it displays its banner and lists all configuration files in the order that it loads them. If the interactive setting is not disabled by the configuration files, then Templeton will prompt for some initial information.
    Enter starting URL: This allows you to specify where Templeton should begin. The URL is in one of the forms:
  • http://host.domain/
  • http://host.domain:port/
  • http://host.domain/path/
  • http://host.domain:port/path/
  • http://host.domain/path/file
  • http://host.domain:port/path/file
    Trailing slashes are optional, but should be used when applicable.

    For example, you can enter: http://www.intel.com or http://c.gp.cs.cmu.edu:5103/prog/webster or http://info.webcrawler.com/mak/projects/robots/robots.html
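
To see how the URL forms above break down into host, port, and path, here is a minimal Python sketch using the standard urllib.parse module. The helper name split_start_url is hypothetical. Filling in port 80 when none is given follows the default described in this document; treating an empty path as "/" is an assumption.

```python
from urllib.parse import urlsplit


def split_start_url(url):
    """Split a starting URL into (host, port, path)."""
    parts = urlsplit(url)
    host = parts.hostname      # lower-cased host name
    port = parts.port or 80    # default port when the URL names none
    path = parts.path or "/"   # assumption: no path means the site root
    return host, port, path
```

For example, split_start_url("http://c.gp.cs.cmu.edu:5103/prog/webster") separates the host c.gp.cs.cmu.edu, the port 5103, and the path /prog/webster.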

    Enter local path ["none" for log files only]: This prompt asks where retrieved files should be placed. You may enter either a path (e.g. D:\FILES\ or /tmp/retrieve) or the word "none". Entering "none" tells Templeton not to save retrieved files; only log files are written. If you operate your own web server, you may specify the root directory for that web server.
    Host restriction [yes|no|host|.domain]: This is the first restriction option. Templeton has the ability to retrieve from many machines, a few machines, or only one machine.
    • Entering "y" or "yes" tells Templeton not to leave the machine listed in the initial URL.
    • Entering "n" or "no" allows Templeton to retrieve from all machines that it comes across (follow all links) -- this is dangerous since (theoretically) it can retrieve THE ENTIRE WEB consisting of MILLIONS OF TERABYTES of data.
    • You may specify a particular host, such as "www.intel.com". You may also specify a port on the host, such as "www.intel.com:2345". If you do not specify a port, then the default HTTP port 80 is assumed.
    • You may specify a domain suffix. Only machines whose names end with that suffix will be retrieved. For example, to retrieve only information from Texas A&M University, you would enter ".tamu.edu". If you want to further restrict it to the Computer Science department, you would use ".cs.tamu.edu".
    Should the host's subtree be restricted [yes|no|/path]: When restricting to a specific host, you may also specify a restrictive subtree on the host. Templeton will not follow links beyond the specified subtree. Entering "yes" will restrict searches to the subtree specified in the initial URL. For example, http://c.gp.cs.cmu.edu:5103/prog/webster has the initial path "/prog". HTML documents outside the /prog directory would not be retrieved. Entering "no" places no restriction on the path, allowing Templeton to wander over the entire web site. Alternatively, you may specify a path. This is useful when the starting URL is not the top of the directory tree. (Frequently, a web page may not be reachable from a page "above" it. This "lower" page may still be the "root" of the virtual subtree.)
    Enter maximum depth [0 for unlimited]: This allows you to specify the number of links to follow. '1' will retrieve only the web page specified by the initial URL. '2' will retrieve the initial URL and all links from that page (restrictions permitting). The larger the number, the more levels of indirect links that will be retrieved. Entering '0' will not restrict the number of links. If you are unsure of the number of links you will require, you should enter a finite number, such as '3', '5', or '10'.
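
The host and subtree restrictions described above amount to two checks on every link. The Python sketch below is one way to express them and is not taken from Templeton itself. The function names are hypothetical, and several details are assumptions: "yes" is taken to compare host names only, a ".domain" answer is matched as a plain suffix of the host name, and "y" is accepted as shorthand for "yes" at both prompts.

```python
import posixpath
from urllib.parse import urlsplit

DEFAULT_PORT = 80


def host_allowed(answer, start_url, candidate_url):
    """Apply a 'Host restriction [yes|no|host|.domain]' answer to a link."""
    answer = answer.strip().lower()
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)
    if answer in ("n", "no"):
        return True                           # follow every link
    if answer in ("y", "yes"):
        return cand.hostname == start.hostname
    if answer.startswith("."):                # domain suffix such as .tamu.edu
        return (cand.hostname or "").endswith(answer)
    rule = urlsplit("//" + answer)            # host or host:port
    return (cand.hostname == rule.hostname
            and (cand.port or DEFAULT_PORT) == (rule.port or DEFAULT_PORT))


def subtree_root(answer, start_url):
    """Turn a subtree answer into a path prefix, or None for no limit."""
    answer = answer.strip()
    if answer.lower() in ("n", "no"):
        return None
    if answer.lower() in ("y", "yes"):
        path = urlsplit(start_url).path
        # "/prog/webster" gives "/prog"; "/people/" gives "/people"
        root = path[:-1] if path.endswith("/") else posixpath.dirname(path)
    else:
        root = answer                         # an explicit /path
    return root.rstrip("/")


def path_allowed(root, candidate_url):
    """True when the link's path lies inside the restricted subtree."""
    if root is None:
        return True
    path = urlsplit(candidate_url).path or "/"
    return path == root or path.startswith(root + "/")
```

A link would be followed only when both host_allowed and path_allowed return True. Matching on root + "/" rather than on the bare prefix keeps a subtree such as /people from also admitting /peoplex.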

    An example response is:

    Enter starting URL: http://www.cs.tamu.edu/people/
    Enter local path ["none" for log files only]: /temp
    Host restriction [yes|no|host|.domain]: yes
    Should the host's subtree be restricted [yes|no|/path]: /people
    Enter maximum depth [0 for unlimited]: 3 
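
The effect of the maximum depth can be illustrated with a small Python sketch that walks a made-up link graph held in memory. The function name crawl_order and the graph are hypothetical, and processing pages level by level (breadth first) is an assumption rather than a statement about how Templeton is implemented. As in the prompt, depth 1 is the starting page itself and 0 means unlimited.

```python
from collections import deque


def crawl_order(links, start, max_depth):
    """Return (depth, url) pairs in the order pages would be visited.

    links     -- dict mapping each URL to the URLs it links to
    start     -- the starting URL (depth 1)
    max_depth -- deepest level to retrieve; 0 means no limit
    """
    seen = {start}
    queue = deque([(1, start)])
    visited = []
    while queue:
        depth, url = queue.popleft()
        visited.append((depth, url))
        if max_depth and depth >= max_depth:
            continue  # do not follow links out of the deepest level
        for target in links.get(url, []):
            if target not in seen:  # each URL is queued only once
                seen.add(target)
                queue.append((depth + 1, target))
    return visited
```

With a maximum depth of 3, as in the example response, the walk covers the starting page, the pages it links to, and the pages those link to, and then stops.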

    Passwords

    Templeton supports basic WWW authentication. This consists of a realm, a user name, and a password. The realm is a quoted string provided by the WWW server. To access the protected documents, you must provide a valid user name and password for the realm.
    Password required for realm = "Secret_Project"
      Enter user name: myusername
      Enter password:
    You will not be shown your password as you type.

    If you incorrectly enter your user name or password, Templeton will prompt you to enter them again. If you do not know a valid user name or password, then enter a hyphen "-" for both fields. This will skip the protected URL.

    A note about security: your user name and password are not secure. Basic authentication uses a simple encoding scheme, not encryption -- so simple that many people can actually read the encoded text without a computer! Anyone with a computer between you and the WWW server can view your user name and password and use them. There is no inherent security.
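
The weakness is easy to demonstrate. In basic authentication, the user name and password are joined with a colon and base64-encoded into an Authorization header, and base64 is reversible by anyone. The Python sketch below shows both directions; the helper names are hypothetical, and the credentials in the usage note are made up.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header sent for basic authentication."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic_auth(header):
    """Recover (user, password) from a captured header; no key is needed."""
    token = header.rsplit(" ", 1)[1]
    raw = base64.b64decode(token).decode("utf-8")
    user, _, password = raw.partition(":")
    return user, password
```

Passing the output of basic_auth_header("myusername", "secret") to decode_basic_auth returns the original pair, which is exactly what an eavesdropper on the connection could do.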

    Files Generated

    Templeton will generate a number of files. The file names use the default index file name and the local saved path (/temp in the above example).
    /temp/mapindex.html: An HTML document showing file links on the remote site.
    /temp/locindex.html: An HTML document showing file links in the local save-path.
    /temp/host.domain/: Directory of files retrieved from the machine host.domain.
    When no local path is specified, only mapindex.html is generated.

    Display

    While Templeton runs, it displays the current progress:
    Current Depth: 2 (3 max)
    Links at current depth: 7
    Total links remaining:  137
    	
    Current URL: http://www.cs.tamu.edu/people/
    Local file:  /temp/www.cs.tamu.edu/people/index.html
      IMAGE: Images/logos/csimage_basic.gif
      LINK:  Images/index.html
      LINK:  people/index.html
      ...  
    This shows that Templeton is currently at depth 2 out of 3. There are 7 links remaining at depth 2, including the current file. Links include HTML documents, images, files, and image maps. There are currently 137 known links remaining at all depths. If the number of links at the current depth is nearly the same as the total number of links remaining, then there is a good chance that Templeton is nearly done. If there is a large difference, then you have a while to wait.

    The status also shows the current URL being processed and the name of the local file (when the URL is being mirrored). Under the local file are the type and name of all links that are found.
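
The Current URL and Local file lines in the display suggest how a URL maps onto the local save path: the host name becomes a directory, and a URL that ends in a slash receives the default index file name. The Python sketch below reproduces that mapping under stated assumptions. The function name local_file_for is hypothetical, index.html is taken from the display above as the default index name, and URLs with a non-default port are not handled because this document does not say how they are stored.

```python
import posixpath
from urllib.parse import urlsplit


def local_file_for(url, save_path, index_name="index.html"):
    """Guess the local file name used when mirroring a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):       # a directory URL gets the index file name
        path += index_name
    return posixpath.join(save_path, parts.hostname, path.lstrip("/"))
```

With a save path of /temp, the URL http://www.cs.tamu.edu/people/ maps to /temp/www.cs.tamu.edu/people/index.html, matching the display.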

    When all links are processed, the program will end. You may also break out of the program at any time.

    User Commands

    While Templeton is running, the user may perform any of the following commands:
    ?, H, or h: List the available commands.
    a or A: Add a URL to be processed. You will be prompted for the URL to add.
    i or I: Interrupt the current file download. When pressed while "reading" a file from a server, the reading will stop and regular processing will continue. When pressed during the processing of a file, the processing is stopped and the next file is retrieved. This can be very useful when Templeton tries to retrieve an undesirable file that is extremely large (or time-consuming).
    l or L: List restrictions. Templeton supports robot exclusion. Typing 'L' shows all known exclusion rules. There are 3 types of rules:
  • FORBID: Templeton cannot go there because the server explicitly said it could not.
  • DENY: Templeton cannot go there.
  • ALLOW: Templeton CAN go there.
    The DENY and ALLOW rules may be changed (from DENY to ALLOW, etc.) or deleted. By entering 'A' or 'D' you may explicitly enter a URL restriction rule, or you may enter the number of a current rule to modify. Entering 'R' allows you to enter the URL (or number of the restriction rule) to remove. Only DENY and ALLOW rules may be modified or removed.
    s or S: Change the sleep interval.
    v or V: View the list of URLs to process. These are listed in the order that they will be processed, from top to bottom. This list includes images, map files, and documents.
    q or Q: Quit Templeton.
    x or X: Exit Templeton. Currently, there is no difference between quitting and exiting.
    any other key: Pauses the system. It is not considered "nice" to pause the system while it is reading from the remote server, since you will be pausing a "live" network connection and taking valuable time from the remote WWW server. "Live" connections that are paused for extended durations will be closed by the remote server.
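
The difference between the three restriction rule types listed under the 'L' command comes down to which rules the user may edit. The Python sketch below models only that behaviour. The class name RuleList and its methods are hypothetical, numbering rules from 1 is an assumption, and how Templeton matches a URL against the rules is deliberately left out because this document does not describe it.

```python
FORBID, DENY, ALLOW = "FORBID", "DENY", "ALLOW"


class RuleList:
    """A numbered list of restriction rules; FORBID rules are read-only."""

    def __init__(self):
        self._rules = []  # each entry is [kind, url]

    def add(self, kind, url):
        """Append a rule and return its number (starting at 1)."""
        if kind not in (FORBID, DENY, ALLOW):
            raise ValueError(f"unknown rule type: {kind}")
        self._rules.append([kind, url])
        return len(self._rules)

    def _editable(self, number):
        rule = self._rules[number - 1]
        if rule[0] == FORBID:
            raise PermissionError("FORBID rules come from the server")
        return rule

    def change(self, number, kind):
        """Switch a rule between DENY and ALLOW."""
        if kind not in (DENY, ALLOW):
            raise ValueError("rules may only be changed to DENY or ALLOW")
        self._editable(number)[0] = kind

    def remove(self, number):
        """Delete a DENY or ALLOW rule."""
        self._editable(number)
        del self._rules[number - 1]

    def listing(self):
        """Return the rules as numbered display lines."""
        return [f"{i}. {kind}: {url}"
                for i, (kind, url) in enumerate(self._rules, 1)]
```

In this model, trying to change or remove a FORBID rule raises an error, while DENY and ALLOW rules can be switched or deleted freely.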

    Document revision: 12 Mar. 1997 for Templeton 1.970
    Copyright 1996,1997 N.A. Krawetz
    Modification, republication, and redistribution of this document is strictly prohibited. All rights reserved.